Hierarchical Text Classification with Latent Concepts

Authors

  • Xipeng Qiu
  • Xuanjing Huang
  • Zhao Liu
  • Jinlong Zhou
Abstract

Recently, hierarchical text classification has become an active research topic. The essential idea is that descendant classes can share the information of their ancestor classes in a predefined taxonomy. In this paper, we claim that each class has several latent concepts and that its subclasses share information with these different concepts respectively. We then propose a variant of the Passive-Aggressive (PA) algorithm for hierarchical text classification with latent concepts. Experimental results show that the performance of our algorithm is competitive with recently proposed hierarchical classification algorithms.
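
For readers unfamiliar with the Passive-Aggressive family, the following is a minimal sketch of the standard flat multiclass PA-I update, not the latent-concept variant proposed in the paper, whose details the abstract does not give. The function names `pa_update` and `predict`, the list-of-lists weight layout, and the aggressiveness parameter `C` are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def predict(weights, x):
    """Return the index of the class whose weight vector scores x highest."""
    return max(range(len(weights)), key=lambda k: dot(weights[k], x))


def pa_update(weights, x, y, C=1.0):
    """One standard multiclass PA-I step (illustrative, not the paper's variant).

    weights: one weight vector per class (at least two classes), updated in place.
    x: feature vector, y: index of the true class.
    Returns the hinge loss measured before the update.
    """
    scores = [dot(w, x) for w in weights]
    # Highest-scoring class other than the true one.
    r = max((k for k in range(len(weights)) if k != y), key=lambda k: scores[k])
    # Hinge loss on the margin between the true class and its strongest rival.
    loss = max(0.0, 1.0 - (scores[y] - scores[r]))
    sq_norm = dot(x, x)
    if loss > 0.0 and sq_norm > 0.0:
        # Passive when the margin is satisfied; otherwise take the smallest
        # step that fixes it, capped at C (the PA-I rule).
        tau = min(C, loss / (2.0 * sq_norm))
        for i, xi in enumerate(x):
            weights[y][i] += tau * xi
            weights[r][i] -= tau * xi
    return loss


# Usage: two classes, two features, one training example of class 0.
weights = [[0.0, 0.0], [0.0, 0.0]]
first_loss = pa_update(weights, [1.0, 0.0], 0)
second_loss = pa_update(weights, [1.0, 0.0], 0)
```

A hierarchical method would additionally tie the weight vectors of related classes together through the taxonomy; this flat sketch omits that entirely.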

Similar resources

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing the semantic concepts of text has motivated researchers to use...

A Hierarchical Approach to Encoding Medical Concepts for Clinical Notes

This paper proposes a hierarchical text categorization (TC) approach to encoding free-text clinical notes with ICD-9-CM codes. A preliminary experimental result on the 2007 Computational Medicine Challenge data shows that a hierarchical TC system achieved a microaveraged F1 value of 86.6, which is comparable to the performance of state-of-the-art flat classification systems.

Statistical modeling of medical indexing processes for biomedical knowledge information discovery from text

The overwhelming amount of published literature in the biomedical domain and the growing number of collaborations across scientific disciplines result in an increasing topical complexity of research articles. This represents an immense challenge for efficient biomedical knowledge discovery from text. We present a new graphical model, the so-called Topic-Concept Model, which extends the basic La...

Hierarchical Bayesian Mixed-Membership Models and Latent Pattern Discovery

Hierarchical Bayesian methods expanded markedly with the introduction of MCMC computation in the 1980s, and this was followed by the explosive growth of machine learning tools involving latent structure for clustering and classification. Nonetheless, model choice remains a major methodological issue, largely because competing models used in machine learning often have different parameterization...

Publication date: 2011